The Multi-State Perfect Phylogeny Problem with Missing and Removable Data: Solutions via Integer-Programming and Chordal Graph Theory

Author

  • Dan Gusfield
Abstract

The Multi-State Perfect Phylogeny Problem is an extension of the Binary Perfect Phylogeny Problem, allowing characters to take on more than two states. In this article, we consider three problems that extend the utility of the multi-state perfect phylogeny model: (1) the Missing Data (MD) Problem, where some entries in the input are missing and the question is whether (bounded) values for the missing data can be imputed so that the resulting data has a multi-state perfect phylogeny; (2) the Character-Removal (CR) Problem, where we want to minimize the number of characters to remove from the data so that the resulting data has a multi-state perfect phylogeny; and (3) the Missing-Data Character-Removal (MDCR) Problem, where the input has missing data and we want to impute values for the missing data to minimize the solution to the resulting Character-Removal Problem. We discuss Integer Linear Programming (ILP) solutions to these problems for the special case of three, four, and five permitted states per character, and we report on extensive empirical testing of these solutions. Then we develop a general theory to solve the MD problem for an arbitrary number of permitted states, using chordal graph theory and results on minimal triangulation of non-chordal graphs. This establishes new necessary and sufficient conditions for the existence of a perfect phylogeny with (or without) missing data. We implement the general theory using integer linear programming, although other optimization methods are possible. We extensively explore the empirical behavior of the general solution, showing that the methods are very practical for data of size and complexity that is characteristic of many current applications in phylogenetics. Some of the empirical results for the MD problem with an arbitrary number of permitted states are very surprising, suggesting the existence of additional combinatorial structure in multi-state perfect phylogenies. 
Finally, we note some relationships between our chordal-graph approach to the multi-state perfect phylogeny, without missing data, and prior methods.
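
As a rough illustration of the chordal-graph viewpoint, the Python sketch below builds the partition intersection graph of a small character matrix and tests whether that graph is already chordal. It relies on the classical characterization, due to Buneman, that a perfect phylogeny exists exactly when this graph can be made chordal without adding an edge between two states of the same character. An already-chordal graph is therefore only a sufficient certificate: a non-chordal graph still requires a search over triangulations, which the article carries out with ILP and which is not reproduced here. The function names, the `MISSING` marker, and the choice to skip missing entries when building the graph are assumptions made for this sketch, not the article's formulation.

```python
from itertools import combinations

MISSING = None  # hypothetical marker for a missing entry (an assumption of this sketch)


def partition_intersection_graph(matrix):
    """Return an adjacency dict for the partition intersection graph.

    Rows are taxa, columns are characters. Each node is a
    (character index, state) pair; two nodes are adjacent when at least
    one taxon exhibits both. Missing entries contribute no node or edge.
    """
    adj = {}
    for row in matrix:
        nodes = [(c, s) for c, s in enumerate(row) if s is not MISSING]
        for node in nodes:
            adj.setdefault(node, set())
        for u, v in combinations(nodes, 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def is_chordal(adj):
    """Test chordality by repeatedly deleting a simplicial vertex.

    A vertex is simplicial when its neighbours are pairwise adjacent.
    A graph is chordal exactly when this deletion process can empty it.
    """
    g = {v: set(nb) for v, nb in adj.items()}  # work on a copy
    while g:
        simplicial = next(
            (v for v, nb in g.items()
             if all(b in g[a] for a, b in combinations(nb, 2))),
            None,
        )
        if simplicial is None:
            return False  # no simplicial vertex left, so a chordless cycle exists
        for nb in g[simplicial]:
            g[nb].discard(simplicial)
        del g[simplicial]
    return True


# Two binary characters showing only three of the four state combinations.
compatible = [[0, 0], [0, 1], [1, 1]]
# All four combinations appear: the graph is a chordless 4-cycle.
conflicting = [[0, 0], [0, 1], [1, 0], [1, 1]]

print(is_chordal(partition_intersection_graph(compatible)))   # True
print(is_chordal(partition_intersection_graph(conflicting)))  # False
```

In the second example the only edges that could repair the 4-cycle would join two states of the same character, so no legal triangulation exists and the data has no perfect phylogeny. For multi-state data a `False` result is not conclusive on its own.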


Similar resources

Potential Maximal Clique Algorithms for Perfect Phylogeny Problems

Kloks, Kratsch, and Spinrad showed how treewidth and minimum-fill, NP-hard combinatorial optimization problems related to minimal triangulations, are broken into subproblems by block subgraphs defined by minimal separators. These ideas were expanded on by Bouchitté and Todinca, who used potential maximal cliques to solve these problems using a dynamic programming approach in time polynomial in ...

Flow Shop Scheduling Problem with Missing Operations: Genetic Algorithm and Tabu Search

Flow shop scheduling problem with missing operations is studied in this paper. Missing operations assumption refers to the fact that at least one job does not visit one machine in the production process. A mixed-binary integer programming model has been presented for this problem to minimize the makespan. The genetic algorithm (GA) and tabu search (TS) are used to deal with the optimization...

An L1-norm method for generating all of efficient solutions of multi-objective integer linear programming problem

This paper extends the proposed method by Jahanshahloo et al. (2004) (a method for generating all the efficient solutions of a 0–1 multi-objective linear programming problem, Asia-Pacific Journal of Operational Research). This paper considers the recession direction for a multi-objective integer linear programming (MOILP) problem and presents necessary and sufficient conditions to have unbounde...

Well-dispersed subsets of non-dominated solutions for MOMILP problem

This paper uses the weighted $L_1$-norm to propose an algorithm for finding a well-dispersed subset of non-dominated solutions of multiple objective mixed integer linear programming problem. When all variables are integer it finds the whole set of efficient solutions. In each iteration of the proposed method only a mixed integer linear programming problem is solved and its optimal solutions gen...

Unique Perfect Phylogeny Characterizations via Uniquely Representable Chordal Graphs

The perfect phylogeny problem is a classic problem in computational biology, where we seek an unrooted phylogeny that is compatible with a set of qualitative characters. Such a tree exists precisely when an intersection graph associated with the character set, called the partition intersection graph, can be triangulated using a restricted set of fill edges. Semple and Steel used the partition i...


Journal title:
  • Journal of computational biology : a journal of computational molecular cell biology

Volume 17, Issue 3

Pages  -

Publication date 2009